Self-Training PCFG Grammars with Latent Annotations Across Languages

نویسندگان

  • Zhongqiang Huang
  • Mary P. Harper
چکیده

We investigate the effectiveness of selftraining PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak’s lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from selftraining. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves stateof-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature-Rich Log-Linear Lexical Model for Latent Variable PCFG Grammars

Context-free grammars with latent annotations (PCFG-LA) have been found to be effective for parsing many languages; however, currently their lexical model may be subject to over-fitting and requires language engineering to handle out-ofvocabulary (OOV) words. Inspired by previous studies that have incorporated rich features into generative models, we propose to use a feature-rich log-linear lex...

متن کامل

Parsing low-resource languages using Gibbs sampling for PCFGs with latent annotations

PCFGs with latent annotations have been shown to be a very effective model for phrase structure parsing. We present a Bayesian model and algorithms based on a Gibbs sampler for parsing with a grammar with latent annotations. For PCFG-LA, we present an additional Gibbs sampler algorithm to learn annotations from training data, which are parse trees with coarse (unannotated) symbols. We show that...

متن کامل

Large-Scale Corpus-Driven PCFG Approximation of an HPSG

We present a novel corpus-driven approach towards grammar approximation for a linguistically deep Head-driven Phrase Structure Grammar. With an unlexicalized probabilistic context-free grammar obtained by Maximum Likelihood Estimate on a largescale automatically annotated corpus, we are able to achieve parsing accuracy higher than the original HPSG-based model. Different ways of enriching the a...

متن کامل

Probabilistic CFG with Latent Annotations

This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Finegrained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM-algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are describe...

متن کامل

Parsing German Topological Fields with Probabilistic Context-Free Grammars

Parsing German Topological Fields with Probabilistic Context-Free Grammars Jackie Chi Kit Cheung M. Sc. Graduate Department of Computer Science University of Toronto 2009 Syntactic analysis is useful for many natural language processing applications requiring further semantic analysis. Recent research in statistical parsing has produced a number of highperformance parsers using probabilistic co...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009